Discrimination and scoring using small sets of genes for two-sample microarray data.

نویسندگان

  • Gilles Guillot
  • Maja Olsson
  • Mikael Benson
  • Mats Rudemo
چکیده

Comparison of gene expression for two groups of individuals form an important subclass of microarray experiments. We study multivariate procedures, in particular use of Hotelling's T2 for discrimination between the groups with a special emphasis on methods based on few genes only. We apply the methods to data from an experiment with a group of atopic dermatitis patients compared with a control group. We also compare our methodology to other recently proposed methods on publicly available datasets. It is found that (i) use of several genes gives a much improved discrimination of the groups as compared to one gene only, (ii) the genes that play the most important role in the multivariate analysis are not necessarily those that rank first in univariate comparisons of the groups, (iii) Linear Discriminant Analysis carried out with sets of 2-5 genes selected according to their Hotelling T2 give results comparable to state-of-the-art methods using many more genes, a feature of our method which might be crucial in clinical applications. Finding groups of genes that together give optimal multivariate discrimination (given the size of the group) can identify crucial pathways and networks of genes responsible for a disease. The computer code that we developed to make computations is available as an R package.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest

Background & objective: Microarray and next generation sequencing (NGS) data are the important sources to find helpful molecular patterns. Also, the great number of gene expression data increases the challenge of how to identify the biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...

متن کامل

Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach

Feature selection can significantly be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not have decent performance in these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...

متن کامل

Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With the advancement of metagenome data mining science has become focused on microarrays. Microarrays are datasets with a large number of genes that are usually irrelevant to the output class; hence, the process of gene selection or feature selection is essential. So, it follows that you can remove redundant genes and increase the speed and accuracy of classification. After applying the gene se...

متن کامل

Selecting Genes with Dissimilar Discrimination Strength for Sample Class Prediction

One of the main applications of microarray technology is to determine the gene expression profiles of diseases and disease treatments. This is typically done by selecting a small number of genes from amongst thousands to tens of thousands, whose expression values are collectively used as classification profiles. This gene selection process is notoriously challenging because microarray data norm...

متن کامل

Identification of Alzheimer disease-relevant genes using a novel hybrid method

Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, but identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Mathematical biosciences

دوره 205 2  شماره 

صفحات  -

تاریخ انتشار 2007